Optical character recognition (OCR) is a technology that converts scanned or photographed images of typewritten text into machine-editable, searchable text.
Scanning or photographing a page from a thick bound volume often introduces various distortions into the image, e.g., distorted text lines in the area of the book's spine. FIG. 1 shows a scanned image of a page of a book. As can be seen, the area indicated by reference numeral 101 contains geometric distortion, or warping.
This distortion may arise because book pages are not in uniform, intimate contact with the scanning surface, or platen, of a scanner. In particular, the portions of a page near the spine of the book are usually the ones that do not lie flat against the platen, so the corresponding regions of the image are distorted. Such distortions prevent correct recognition of words located close to the binding edge of the book.